THE STATISTICAL METHOD IN PROBLEMS OF WATER 
SUPPLY QUALITY 

By Abel Wolman, Maryland State Department of Health 



INTRODUCTORY 

The concept of water supply quality has the simplicity of the 
unknown to the layman, but the complexity of the universe to the 
sanitarian. If one uses the mathematician's measure of the complexity 
of a function — the number of its attributes — the problem of water 
supply quality, a function dependent upon mutually active natural, 
physical, chemical and biological phenomena, offers an attractive 
field of study to the statistician. For the professional statistician has 
been concerned always with "quantitative data affected by a multiplic- 
ity of causes" 1 and with their elucidation. In considering the causes 
operating to produce relatively good or bad waters, such as rainfalls, 
pollution, purification, etc., and their interpretation upon the basis of 
laboratory findings and personal surveys it becomes manifest that 
problems of water supply quality fall well within the scope of statistical 
method. Just as in all statistical problems, so in that of water supply 
quality, the investigator is confronted with the two-fold task of deter- 
mining the method of evaluating the units of interpretation and of 
defining the limiting values of such units. The method of approach 
to each problem involves a statistical viewpoint, as well as a quantita- 
tive methodology. The present paper has been prepared in order to 
illustrate, in as brief terms as possible, this statistical method of ap- 
proach, by developing therein a few examples of its application to the 
question of water supply quality. The writer plans to trace the 
evolution of the concept of water supply quality in the sanitarian's 
mind and to point out in such a development the function which the 
statistical art has performed or may be expected to supply in the 
future. The discussion appears to be a necessary one since hitherto 
the water supply investigator has been accused of an aversion for the 
quantitative sciences, while, on the other hand, the professional statis- 
tician has shown a neglect of a field which perhaps did not appear to 
be worthy of his mettle. The present study may serve to remove this 
friendly distrust which retards in a degree progress in critical studies 
of water supply quality. 

I. THE LABORATORY EXAMINATION OF WATER SUPPLIES 

The sanitary quality of a water supply must be predicated necessarily 
upon the demonstration of its relative inability to produce disease. 
With the present germ theory of disease, such a demonstration resolves 
itself into the laboratory problem of enumerating the number and 
types of pathogenic organisms in stated quantities of water. It is 
apparent, therefore, that the technique in this instance is largely 
bacteriological, and the discussion, for purposes of simplicity, may be 
restricted to the problems of the evaluation of bacterial units, as illus- 
trative of the statistical method. 

It is manifestly impossible and impracticable to examine an un- 
known water in such manner as to determine its content of all kinds of 
bacteria or even of those relatively few classes of specific organisms 
which it is known are both disease-producing and capable of living in 
water. It is more desirable as well as convenient, therefore, to choose 
one family or group of micro-organisms whose natural habitat and 
life history are similar to the variety of pathogenic organisms and 
whose detection by laboratory methods is most simple and speedy. 
Bacteriologists have concluded that a particular class of bacteria 
serves as the most convenient index to water supply quality or con- 
tamination. They have chosen as this index class or type the so-called 
colon or bacillus coli group. The B. coli group has been so selected, 
because its origin is in general the colon or digestive tract of man and 
its presence is usually indicative of human sewage pollution (the 
possible and probable existence of colon types in other environments 
need not concern us at this point). 

One of the primary objects of the bacteriologist, therefore, is to 
differentiate the bacterial species present in a water supply, so as to 
demonstrate the presence or absence of members of the B. coli group. 
In addition, it is necessary to obtain some idea of the relative frequency 
of such a group, since smaller numbers naturally connote a more 
remote pollution, due to the dying off of bacteria in the unfavorable 
environment of water, to the presence of antagonistic life, and to other 
natural and artificial barriers to its development. The problems 
arising in the laboratory differentiation of bacterial varieties offer, 
therefore, material for an initial example. Two general methods of 
distinguishing groups of bacteria are available. Both are based upon 
the method of differences. In the one case, morphological or structural 
characteristics, and in the other, metabolic distinctions control. 
Various classifications of the colon group, for instance, are based upon 
its ability to produce acid and gas from fermentable substances. 
Investigators have observed that certain types of B. coli ferment 
such complex organic compounds as sucrose, dulcitol and raffinose 
while others do not. Differences in the amount and character of gas 
formation from certain substances distinguish other types of bacteria. 
In all classifications, however, it has been recognized that the same 
group may have a variety of reactions which overlap partially those 
of other groups. Two types of bacteria, for instance, may both fer- 
ment sucrose, but may differ in their effect upon a second or third 
compound. This gives rise naturally to a vast amount of possible 
combinations between characters and Levine 2 points out that "as 
the number of fermentable substances increases, the number of varie- 
ties increases geometrically approaching infinity. The number of 
'varieties' is given by the formula 2^n where n is the number of characters 
studied. Thus with 8 characters there are 256 possible combinations; 
this number rises to 1,024 with 10 characters and to 65,536 
when 16 characters are observed. The absurdity of regarding each 
character as of similar and equal differential value is thus evident." 
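
The arithmetic in Levine's remark can be checked directly. The following is a minimal Python sketch, assuming each character is scored only as positive or negative; the function name is illustrative and does not come from Levine.

```python
def possible_varieties(n_characters: int) -> int:
    """Number of distinct +/- reaction patterns for n two-valued characters."""
    return 2 ** n_characters


# The three cases quoted in the text: 8, 10 and 16 characters.
for n in (8, 10, 16):
    print(n, possible_varieties(n))
```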

Levine, as well as other more recent investigators, has concluded 
that the principle of the correlation of characters should be emphasized 
in the attempt to distinguish bacterial species. He points out that 
certain properties have been universally accepted, after long checking, 
as reliable evidences of bacterial differences. Among such properties, 
he enumerates the selective dyeing of bacteria, their powers of spore 
formation, and their adaptation to aerobic or anaerobic development. 
The taxonomic value of the characters of motility, indol formation, 
and fermentation of certain compounds, on the other hand, he assumes 
to be still debatable. In order to avoid the adoption of a confusing 
classification of bacteria upon the basis of every character studied (of 
which we have indicated only a few) he has recourse to a basis of sub- 
division "on that character which gives the greatest amount of infor- 
mation as to the manner in which the resulting sub-groups react with 
respect to other characters." 2 By making use of the above principle 
Levine evolves a classification of coli-like bacteria which is based almost 
completely upon statistically evaluated correlated characters. For the 
purpose of this study, he recognizes two main strains of bacteria, the 
B. coli and the B. aerogenes-cloacae group, which earlier investigations 
have shown to be distinguishable most often by their reactions to 
methyl-red and to the Voges-Proskauer reagent. The first strain is 
usually methyl-red positive and Voges-Proskauer negative, while the 
second strain shows the reverse. The justification of this initial sub- 
division into two main groups consists in the fact that the strains thus 
subdivided show end products of carbohydrate fermentations of two 
entirely distinct kinds. 

Levine's procedure consists in tabulating all of the reactions of the 
organisms studied in each of the above two groups in two different 
tables, from which are calculated the coefficients of correlation for each 
pair of characters. He selects, then, for subdivision that character 
which gives the highest coefficient of correlation with the greatest 
number of other characters. For these resulting sub-groups new corre- 
lation tables are prepared and further subdivision is made. These 
sub-groups are regarded as species and each is assigned its name. 
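
The paper does not say which coefficient of correlation Levine computed. As a sketch only, and assuming the ordinary product-moment coefficient applied to characters coded as positive or negative (the fourfold or "phi" form), the coefficient for one pair of characters could be obtained as follows. The counts are hypothetical and are not Levine's data.

```python
import math


def fourfold_correlation(both_pos, first_only, second_only, both_neg):
    """Product-moment (phi) correlation of two +/- characters from a 2x2 table.

    Assumption: this is one common coefficient for two-valued characters;
    the source does not state which coefficient Levine actually used.
    """
    a, b, c, d = both_pos, first_only, second_only, both_neg
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("a character that never varies has no correlation")
    return (a * d - b * c) / denominator


# Hypothetical counts for 100 strains tested for two characters.
r = fourfold_correlation(both_pos=40, first_only=10, second_only=5, both_neg=45)
print(round(r, 2))
```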

In order to illustrate Levine's use of the coefficient of correlation for 
taxonomic purposes, let us follow his procedure in the subdivision of 
the B. coli, or methyl-red positive and Voges-Proskauer negative, 
group of bacteria. For the 182 strains of this group that were studied 
by means of microscopic and metabolic methods, the coefficients of 
correlation shown in Table I were obtained. 



TABLE I. 
COEFFICIENTS OF CORRELATION OBTAINED FROM PAIRS OF CHARACTERS 
AMONG 182 STRAINS OF THE B. COLI GROUP 

            Motility   Indol   Sucrose  Raffinose  Dulcitol  Glycerol  Salicin
Motility       ..      -.39     +.53      +.43       +.53      +.18     +.40
Indol         -.39      ..      +.08      +.00       +.02      -.28     +.76
Sucrose       +.53     +.08      ..       +.99       +.58      -.38     +.20
Raffinose     +.43     +.00     +.99       ..        +.58      -.29     +.27
Dulcitol      +.53     +.02     +.58      +.58        ..       -.21     +.60
Glycerol      +.18     -.28     -.38      -.29       -.21       ..      +.52
Salicin       +.40     +.76     +.20      +.27       +.60      +.52      ..


Since Levine's criterion for the choice of a character for subdivision 
is that that character should give the highest coefficient of correlation 
with most other characters, it is apparent, from an inspection of Table 
I, that sucrose, raffinose, dulcitol, and salicin meet this criterion more 
completely than do other properties. For special technical reasons, 
Levine chooses sucrose for primary division of the B. coli group and 
obtains by differentiation on sucrose ninety-three strains of the sucrose 
positive and eighty-nine strains of the sucrose negative groups. These 
two groups combined form, of course, the total of 182 strains initially 
chosen for study. Further study of the sucrose positive strains dis- 
closes a series of coefficients of correlation of characters as shown in 
Table II. 

TABLE II. 
COEFFICIENTS OF CORRELATION FOR PAIRS OF CHARACTERS AMONG 93 
SUCROSE POSITIVE STRAINS OF THE B. COLI GROUP 

            Motility   Indol   Dulcitol  Glycerol  Salicin
Motility       ..      -.27     +.67      +.40      +.54
Indol         -.27      ..      +.05      -.42      +.28
Dulcitol      +.67     +.05      ..       -.32      +.39
Glycerol      +.40     -.42     -.32       ..       +.32
Salicin       +.54     +.28     +.39      +.32       ..


192 American Statistical Association [52 

Table II indicates that motility is the best correlated character and 
this property provides, therefore, for two further sub-groups, a sucrose- 
positive motile sub-group and a sucrose-positive non-motile group. 
These sub-groups are treated in the manner already illustrated and the 
coefficients of correlation for different characters provide for further 
subdivision. With the aid of this statistical interpretation of his stud- 
ies of 333 coli-like bacteria, isolated from various sources, Levine sug- 
gests a classification of bacterial varieties. The summary of this 
classification need not be repeated here, since the reader is interested 
more in his method of attack than in the resulting bacteriological 
findings. 
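
The selection step applied to Tables I and II can be sketched in Python. Levine's criterion ("the highest coefficient of correlation with most other characters") is not reduced to a formula in this paper; the sketch assumes one plausible reading, namely ranking each character by the mean absolute value of its coefficients with all the others. The function and variable names are illustrative.

```python
def rank_by_mean_abs_correlation(labels, rows):
    """Order characters by mean |r| with the other characters, highest first.

    Assumption: mean absolute correlation stands in for Levine's verbal
    criterion, which the source does not state as a formula.
    """
    scores = {}
    for label, row in zip(labels, rows):
        others = [abs(r) for r in row if r is not None]
        scores[label] = sum(others) / len(others)
    return sorted(scores, key=scores.get, reverse=True), scores


# Coefficients as read from Tables I and II (None marks the diagonal).
TABLE_I_LABELS = ["motility", "indol", "sucrose", "raffinose",
                  "dulcitol", "glycerol", "salicin"]
TABLE_I = [
    [None, -0.39, 0.53, 0.43, 0.53, 0.18, 0.40],
    [-0.39, None, 0.08, 0.00, 0.02, -0.28, 0.76],
    [0.53, 0.08, None, 0.99, 0.58, -0.38, 0.20],
    [0.43, 0.00, 0.99, None, 0.58, -0.29, 0.27],
    [0.53, 0.02, 0.58, 0.58, None, -0.21, 0.60],
    [0.18, -0.28, -0.38, -0.29, -0.21, None, 0.52],
    [0.40, 0.76, 0.20, 0.27, 0.60, 0.52, None],
]
TABLE_II_LABELS = ["motility", "indol", "dulcitol", "glycerol", "salicin"]
TABLE_II = [
    [None, -0.27, 0.67, 0.40, 0.54],
    [-0.27, None, 0.05, -0.42, 0.28],
    [0.67, 0.05, None, -0.32, 0.39],
    [0.40, -0.42, -0.32, None, 0.32],
    [0.54, 0.28, 0.39, 0.32, None],
]

order_i, scores_i = rank_by_mean_abs_correlation(TABLE_I_LABELS, TABLE_I)
order_ii, scores_ii = rank_by_mean_abs_correlation(TABLE_II_LABELS, TABLE_II)
print(order_i[:4])   # leading characters among the 182 strains
print(order_ii[0])   # leading character among the 93 sucrose positive strains
```

Under this reading the four leading characters of Table I are sucrose, salicin, raffinose and dulcitol, and motility leads in Table II, which agrees with the choices described in the text.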

Such classifications as Levine's supply the sanitarian with the quali- 
tative information necessary for the interpretation of one phase of the 
water supply quality problem.* The analyst dealing with waters is 
concerned not only with the nature of the bacterial types present 
therein, but also with the magnitude of their content, since it is the latter 
which indicates the degree and the remoteness of pollution. In the 
search for a potable water, it is often useless to seek that water which 
has no possible source of contamination, but it is always necessary to 
determine the quantitative bacterial importance of the latter. The 
methods so far described answer only one question, that is, what types 
of bacteria are present in the water. In the solution of the second 
inquiry, regarding the number of a particular type in a stated quantity 
of water, statistical method has played recently an important part. 

In the simpler tests for the B. coli group in waters, the so-called 
fermentation-tubes are used. These tubes contain the medium 
selected for most efficient differentiation of the B. coli group from other 
kinds of bacteria and are inoculated with specific quantities of the water 
to be tested. The production of gas in the tubes after stated periods 
of incubation indicates the presence of the B. coli group. Our knowl- 
edge that of five tubes, each inoculated with 0.1 c.c. of the water, four 
show the presence of the organism, is of value, but more important is 
the additional fact that such a series of findings indicates that the probable 
number of organisms in the sample tested is about 1,600 per 100 c.c. 
This conversion of qualitative fermentation-tube results into quanti- 
tative values is of special interest to the statistician. 

In 1915, McCrady 3 showed that "the frequency of the appearance of 
the fermenting organism in the volume drawn from the sample for the 
test is an exponential function of the number of such organisms in the 
sample," and that "every fermentation-tube result, whether simple or 
compound, corresponds to one most probable number of organisms." 

*The subdivisions Levine develops have their importance to the investigator in the fact that species 
or varieties appear to be somewhat correlated with habitat or source of pollution. 
By employing the theory of probabilities, he demonstrates that, given 
the result "p/(p+q) in 1 volume," for instance, the corresponding most 
probable number is given by the solution for x of the equation 

$$1 - \left(1 - \frac{1}{V}\right)^{x} = \frac{p}{p+q}$$

Thus, for the result "five out of ten tubes positive in 1 c.c.," the most 
probable number is given by solution of the equation $1 - .99^{x} = 5/10$, 
since V = 100 c.c., assumed as the original quantity of water sampled. 
The equation being solved, x = 69, or the most probable number of B. 
coli in the sample, per 100 c.c. 
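
The single-dilution relation can be solved in closed form. The following Python sketch assumes the equation takes the form 1 - (1 - v/V)^x = p/(p+q), where v is the volume placed in each tube; writing v/V in place of 1/V is a generalization introduced here so that the 0.1 c.c. example given earlier can be checked with the same function. The function and parameter names are illustrative.

```python
import math


def mpn_single_dilution(positive, tubes, tube_volume_cc, sample_volume_cc=100.0):
    """Most probable number of organisms in sample_volume_cc of water.

    Solves 1 - (1 - v/V)**x = positive/tubes for x.
    Assumption: v/V generalizes the "1 volume in V" case quoted in the text.
    """
    if not 0 <= positive < tubes:
        raise ValueError("need at least one negative tube for a finite estimate")
    fraction_negative = (tubes - positive) / tubes
    return math.log(fraction_negative) / math.log(1.0 - tube_volume_cc / sample_volume_cc)


# Five of ten 1 c.c. tubes positive: the worked example in the text.
print(round(mpn_single_dilution(5, 10, 1.0)))
# Four of five 0.1 c.c. tubes positive: the example quoted as "about 1,600".
print(round(mpn_single_dilution(4, 5, 0.1)))
```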

For compound results, such as p/(p+q) in 10 c.c., r/(r+s) in 1 c.c., a 
more complicated formula is employed which is built up, as follows: 3 
For the result p/(p+q) in 10 c.c. the equation becomes 

$$(p+q)\,\log .9 = \frac{p \log .9}{1 - .9^{x}}$$

which is obtained by differentiating for a maximum the equation given in 
the earlier paragraph for the probability of the results. 

If the result is p/(p+q) in 10 c.c., r/(r+s) in 1 c.c., the equation stands 

$$(p+q)\,\log .9 + (r+s)\,\log .99 = \frac{p \log .9}{1 - .9^{x}} + \frac{r \log .99}{1 - .99^{x}}$$

where (p+q) = number of tubes inoculated with 10 c.c. of sample 
      (r+s) = number of tubes inoculated with 1 c.c. of sample 
      x = number of fermenting organisms in 100 c.c. of sample 
      p and r = number of tubes giving positive results in 10 and 1 c.c. 
      respectively. 

If lower additional quantities of water are tested, extra similar 
terms are added to each side of the above equation. This equation 
has been modified by Wolman and Weaver 4 into 

$$100(p+q) + 10(r+s) = \frac{100p}{1 - .9^{x}} + \frac{10r}{1 - .99^{x}}$$

since, approximately, log .9 = 10 log .99 = 100 log .999. 
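
The compound equation has no closed-form solution for x, but it is easily solved numerically. The sketch below uses simple bisection, which is a choice made here and not a method taken from McCrady or from Wolman and Weaver. It assumes each term has the form weight × positive / (1 - base^x), with base = 1 - v/V, and with the weight equal either to the logarithm term or, in the simplified form, to the tube volume. All names and the sample tube results are illustrative.

```python
import math


def mpn_compound(results, sample_volume_cc=100.0, approximate=False):
    """Most probable number per sample volume from several tube volumes.

    results: list of (tube_volume_cc, positive_tubes, total_tubes).
    approximate=True weights each term by the tube volume, which is
    proportional to the 100 and 10 of the simplified equation in the text.
    """
    terms = []
    for volume, positive, total in results:
        base = 1.0 - volume / sample_volume_cc
        weight = volume if approximate else -math.log(base)
        terms.append((base, weight, positive, total))
    no_positive = not any(p > 0 for _, _, p, _ in terms)
    no_negative = all(p == t for _, _, p, t in terms)
    if no_positive or no_negative:
        raise ValueError("need at least one positive and one negative tube")

    def imbalance(x):
        # Right side minus left side of the equation; it decreases as x grows.
        return sum(w * p / (1.0 - b ** x) - w * t for b, w, p, t in terms)

    low, high = 1e-9, 1.0
    while imbalance(high) > 0:      # widen the bracket until the sign changes
        high *= 2.0
    for _ in range(200):            # bisection
        middle = 0.5 * (low + high)
        if imbalance(middle) > 0:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


# Hypothetical result: 3 of 5 tubes positive in 10 c.c., 1 of 5 positive in 1 c.c.
tubes = [(10.0, 3, 5), (1.0, 1, 5)]
exact = mpn_compound(tubes)
simplified = mpn_compound(tubes, approximate=True)
print(round(exact, 2), round(simplified, 2))
```

With a single tube volume the function reduces to the single-dilution equation, which provides a convenient check on the implementation.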

McCrady published later 5 a series of tables for the rapid interpre- 
tation of these results which makes the standardized use of the prob- 
able numbers of B. coli possible for the water supply investigator.* 

*The assumption of McCrady that the distribution of B. coli is similar to that in a mixture of a few 
red balls with many white balls is to be contrasted with the hypothesis of other workers that bacteria 
are uniformly distributed in water (G. C. Whipple 13 ). More recent independent investigators, however, 
confirm McCrady's assumptions. 
The work of McCrady was followed by other investigations dealing 
with the numerical interpretations of B. coli tests, of which the more 
important are Stein 6, 7, 8, Greenwood and Yule 9, and Wells 10, 11, 12. 
The results of Stein and Greenwood and Yule, although differing in 
technique and in additional interesting viewpoints, are in substantial 
agreement with those obtained by McCrady. Stein 8 adds consider- 
able interesting statistical material to the B. coli problem by introduc- 
ing the so-called B. coli factor method, in which he considers the most 
probable number of B. coli per c.c. from the percentage of positive 
tests, the expected error of results, the study of the distribution of coli 
during a series of tests, and the "coli characteristic" which attempts to 
show by one figure, the average coli, the expected error and the variable 
distribution. 

The discussion of the problem by Greenwood and Yule 9 has all the 
intricacy and mathematical complexity usually associated with Yule's 
contributions. Their findings, however, agree with those of McCrady 
and Stein. Greenwood and Yule, for instance, give as their formula 
for the number of B. coli per c.c., when using several tubes with 1 c.c. 
each, 

$$x = \text{B. coli per c.c.} = 2.3 \log \frac{p+q}{q}$$

whereas McCrady gives for the same condition (using an original size 
sample of 1,000 c.c.) 

$$x = \frac{\log \frac{q}{p+q}}{1{,}000 \log .999} = \frac{\log \frac{q}{p+q}}{-1{,}000\,(.0004344)} = \frac{\log \frac{q}{p+q}}{-.4344} = -2.3 \log \frac{q}{p+q} = 2.3 \log \frac{p+q}{q}$$

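
The agreement of the two expressions can be confirmed numerically. This short Python sketch evaluates both for a hypothetical result of five positive and five negative tubes, taking p as the number of positive tubes and q as the number of negative tubes; the function names are illustrative.

```python
import math


def greenwood_yule_per_cc(positive, negative):
    """B. coli per c.c. by the Greenwood and Yule form, 2.3 log10((p+q)/q)."""
    return 2.3 * math.log10((positive + negative) / negative)


def mccrady_per_cc(positive, negative, sample_volume_cc=1000.0):
    """B. coli per c.c. by the McCrady form for 1 c.c. tubes from a 1,000 c.c. sample."""
    total = positive + negative
    per_sample = math.log10(negative / total) / math.log10(1.0 - 1.0 / sample_volume_cc)
    return per_sample / sample_volume_cc


gy = greenwood_yule_per_cc(5, 5)
mc = mccrady_per_cc(5, 5)
print(round(gy, 3), round(mc, 3))
```

The small remaining difference comes from rounding the factor to 2.3 and from the finite sample volume of 1,000 c.c.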

Perhaps the mathematician's interest may be aroused to the sani- 
tarian's problems of water supply by the mere examination of Green- 
wood and Yule's discussions, while the bacteriologist may view with 
some alarm the same paper. It should be postulated in either case, 
however, that superficial considerations should not prevent the mutual 
aid which these two branches of science may extend to each other. 
While such complexity of treatment of the numerical interpretation of 
fermentation-tube tests as is indicated by the formula 

[Integral expression of Greenwood and Yule; the symbols are not legible in this copy.] 

may attract the statistician, it is hoped that it may not at the same 
time deter the laboratory technician from the adoption of devices 
which provide for more adequate solutions of his problems. Emphasis 
must be placed upon the fact that the mental attitude resulting from 
the adoption of statistical method has much promise in a field of 
endeavor where laboratory findings are too infrequently tested for 
accuracy of interpretation and rarely treated as examples of mass 
phenomena. The work of such men as Stein and McCrady has done 
much to introduce such methods by clarifying our concepts of fermen- 
tation tube results and their relative significance. 

That the statistical method is an important asset in the exposition 
of laboratory findings is illustrated in another series of studies of various 
phases of water supply. Whipple 13 , for instance, has demonstrated 
that "if, in a series of daily observations of the number of bacteria in 
a filter effluent extending over a year the deviation of any determination 
from the mean should be found to be more than five times as much as 
the probable error, to use a round number, this should be rejected from 
the series as being, for some reason or other, abnormal." He has 
made important contributions to the study of the frequency distribu- 
tions of measures of various bacteriological, biological, and chemical 
characteristics of water, such as the preliminary finding that extended 
series of filter effluent results follow definite statistical laws in their 
distribution. His conclusion has been further substantiated by the 
more recent study of Wolman 14 of thousands of laboratory findings, 
in which it is indicated that the logarithms of bacterial counts, through 
long periods of time, have the characteristic normal probability dis- 
tribution of more familiar biological statistical data. 
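
Whipple's rejection rule lends itself to a short sketch. The Python below assumes that the "probable error" of a single observation is 0.6745 times the standard deviation, the usual convention under the normal law, and that the standard deviation is computed from the whole series including the suspect value. The counts are hypothetical.

```python
import statistics

# Assumption: probable error of one observation = 0.6745 * standard deviation.
PROBABLE_ERROR_FACTOR = 0.6745


def reject_abnormal(values, multiple=5.0):
    """Split a series into kept and rejected values by the 5 x probable error rule."""
    mean = statistics.fmean(values)
    probable_error = PROBABLE_ERROR_FACTOR * statistics.stdev(values)
    limit = multiple * probable_error
    kept = [v for v in values if abs(v - mean) <= limit]
    rejected = [v for v in values if abs(v - mean) > limit]
    return kept, rejected


# Hypothetical daily bacterial counts in a filter effluent, with one abnormal day.
daily_counts = [18, 22, 20, 19, 21, 23, 17, 20, 22, 19, 21, 20,
                18, 24, 20, 19, 21, 22, 20, 18, 23, 19, 21, 20, 400]
kept, rejected = reject_abnormal(daily_counts)
print(rejected)
```

A series must be reasonably long before any single value can exceed five probable errors, since the suspect value itself inflates the standard deviation.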

It is of considerable interest to refer at this point to a form of graph 
presentation of data developed by a sanitary engineer which may be 
unfamiliar to most statisticians. Allen Hazen 16 in 1914 devised a 
form of chart ruled with a horizontal scale so divided that the curve of 
probability would plot thereon as a straight line. Any series of ob- 
servations, therefore, which varied in accordance with the probability 
law would plot also as a straight line. Illustrations of the use of such 
paper in water supply problems may be found in the original paper of 
Hazen 16 and in subsequent discussions by Whipple 13 and Wolman 14 . 
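
The principle of such paper can be imitated numerically: if the ordered observations are set against the corresponding normal deviates, a series obeying the probability law falls on a straight line. The Python sketch below measures straightness by the correlation between the two. It assumes a plotting position of (i - 0.5)/n for the i-th ordered value and uses simulated, not observed, counts.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probability_plot_straightness(values):
    """Correlation between ordered values and normal deviates at (i - 0.5)/n."""
    ordered = sorted(values)
    n = len(ordered)
    normal_scores = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return pearson(normal_scores, ordered)


# Simulated counts whose logarithms follow the normal law (an assumption of the demo).
rng = random.Random(1)
counts = [rng.lognormvariate(3.0, 1.0) for _ in range(200)]
raw_r = probability_plot_straightness(counts)
log_r = probability_plot_straightness([math.log10(c) for c in counts])
print(round(raw_r, 3), round(log_r, 3))
```

A value near 1 for the logarithms, and a distinctly lower value for the raw counts, corresponds to a straight line on the paper for the former and a curved one for the latter.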

Stein 16 , in his study of the bacterial count in water and sewage, has 
added considerable material to our conceptions of the variability of 
laboratory findings and their importance in practical studies. He has 
concluded, after an interesting detailed analysis of the problem, that: 

(a) For platings of a single sample of water, the mean error is equal 
to the square root of the number of colonies on a single plate, or the 
square root of the average number of colonies on several plates. 
(b) The variations to be expected for careful and accurate work with 
bacterial counts are indicated by: 

(1) Standard Deviation of ± 12% 

(2) Deviation (1 in 10 times) of ± 25% 
For ordinary routine work: 

(1) Standard Deviation of ± 25% 

(2) Deviation (1 in 10 times) of ± 50% 
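
Conclusion (a) can be put into a few lines of Python. The sketch assumes that "mean error" is to be read as the square root of the average colony count and expresses it also as a percentage of that count; the function name and the plate counts are illustrative.

```python
import math


def plate_count_mean_error(colony_counts):
    """Return (mean error, mean error as percent of the count) for replicate plates."""
    mean_count = sum(colony_counts) / len(colony_counts)
    error = math.sqrt(mean_count)
    return error, 100.0 * error / mean_count


# Hypothetical plates from a single sample.
print(plate_count_mean_error([100]))         # one plate with 100 colonies
print(plate_count_mean_error([60, 64, 68]))  # three plates averaging 64 colonies
```

The percentage error falls as the count rises, so that a plate of 100 colonies carries an expected error of about 10 per cent.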

His comparison of the characteristics of bacteriological data with 
certain mathematical series should be of interest to the reader, since he 
shows, for example, that for daily tests of Lake Erie water for one 
month the Lexian Ratio is 29.00 and the Disturbancy Coefficient 124.00, 
while the corresponding values for a normal mathematical series 
(Bernoulli) are given as 1.00 and 0.00 respectively. 
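
The paper does not define the two measures. The Python sketch below rests on two assumptions that should be checked against Stein's original: that the Lexian ratio is the observed standard deviation divided by the standard deviation expected from chance sampling alone (taken here as the square root of the mean count), and that the disturbancy coefficient is Charlier's, 100 times the square root of the excess of the observed variance over the chance variance, divided by the mean. The daily counts are hypothetical.

```python
import math
import statistics


def lexis_ratio(counts):
    """Observed standard deviation over the chance-sampling standard deviation.

    Assumption: chance variation of a count is taken as sqrt(mean).
    """
    mean = statistics.fmean(counts)
    return statistics.pstdev(counts) / math.sqrt(mean)


def disturbancy_coefficient(counts):
    """Charlier-type coefficient: 100 * sqrt(observed variance - chance variance) / mean.

    Assumption: returns 0.0 when the observed variance does not exceed
    the chance variance, since the square root would otherwise be imaginary.
    """
    mean = statistics.fmean(counts)
    excess = statistics.pvariance(counts) - mean
    return 100.0 * math.sqrt(max(excess, 0.0)) / mean


# Hypothetical daily counts for a raw lake water.
daily = [120, 340, 95, 800, 410, 150, 2200, 310, 180, 560]
print(round(lexis_ratio(daily), 2), round(disturbancy_coefficient(daily), 2))
```

On these definitions a series varying by chance alone gives values near 1 and 0, in keeping with the figures quoted for the normal series.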

II. THE INTERPRETATION OF THE QUALITY OF WATER SUPPLIES 

In preceding paragraphs the writer has indicated a few of the problems 
encountered in the laboratory technique of water supply examination, 
which lend themselves to statistical treatment. It has been impossible 
to include in the present brief paper any complete survey of such appli- 
cations to other phases of laboratory procedure, but sufficient material 
has been presented to demonstrate that the data in the field of laboratory 
technique have much to offer to the professional statistician 
as bases for the development of interpretative principles of quality. 

The writer believes that some mention should be made briefly of 
certain interesting possibilities of development in the application of 
statistical method to general problems of laboratory procedure. The 
use of the coefficient of partial correlation, for instance, does not appear 
to have been introduced widely in the interpretation of laboratory find- 
ings, yet the necessity for its application is most apparent. Often 
investigative work in water supplies is carried out on a large or plant 
scale with the aid of analytical laboratory methods. In the study of 
the chlorination of a water supply, for example, a number of different 
variable quantities such as turbidity, color, organic content, and bac- 
terial densities have their effect in modifying the efficiency of the dis- 
infection process. In practically all conclusions from such studies no 
attempt is made to determine mathematically the effect of such varia- 
bles, other than by mere inspection of tabulated data. There is little 
doubt that erroneous conclusions are often obtained through the failure 
to evaluate quantitatively the importance of fluctuations in the various 
characteristics of waters subject to chlorination. It is almost impossi- 
ble to determine by qualitative inspection of a series of daily observa- 
tions, over an entire year, of temperature, turbidity, color, organic 
content, and bacterial density in a water supply, whether the effect of 
a constant dosage of chlorine is influenced more greatly by any one of 
the above characteristics or by a combination of several or all of them. 

The same problem arises, of course, in the study of any of the phe- 
nomena associated with the purification of water supplies. In the co- 
agulation of suspended matter in water, for instance, all the variables 
such as time, agitation, temperature, hydrogen-ion concentration, 
nature of suspended matter, and character of coagulant play an inter- 
connected part. The principle of partial correlation could be adapted 
with profit to these problems of associated phenomena. 
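
For three variables the coefficient of partial correlation follows from the three ordinary coefficients. The Python sketch below uses the standard first-order formula and a made-up series in which x and y are related only through a third variable z (which might stand, say, for temperature); the data are constructed for illustration and are not observations.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y with the influence of z removed (first order)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Made-up series: x and y each follow z, with independent disturbances added.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]
y = [2, 3, 2, 3, 4, 5, 8, 9]
print(round(pearson(x, y), 2), round(partial_correlation(x, y, z), 2))
```

Here the ordinary coefficient between x and y is high, yet the partial coefficient vanishes once z is held constant, which is the kind of distinction that inspection of tabulated data cannot supply.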

The application of such a statistical principle as pointed out above 
is complicated, however, by the fact that the more simple statistical 
coefficients usually cannot be directly applied to the problems encoun- 
tered, on account of the fact that such measures presuppose the use of 
data having a symmetrical or Gaussian distribution, while the phe- 
nomena with which the sanitarian has to deal often are characterized 
by asymmetrical distributions. 17 , 18 

Michael 18 has discussed in this connection the determination of the 
most probable number of bacteria present in a sample and has demon- 
strated that it is not permissible to apply the probable error in the usual 
manner on account of the fact that the logarithms of the plate counts, 
and not the counts themselves, show a Gaussian frequency distribu- 
tion 19 . McEwen and Michael 17 in another field of investigation have 
been confronted with the same problem of determining the "functional 
relation of one variable to each of a number of correlated variables" 
where such variables do not show the usual symmetrical frequency 
distribution. It is manifestly impossible to extend in this paper the 
elucidation of these applications of statistical method to problems of 
laboratory and plant, but the reader may find profitable data in the 
original papers already noted. 
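
When the logarithms of the counts, and not the counts themselves, follow the Gaussian law, the mean and probable error are properly computed on the logarithmic scale and then converted back. The Python sketch below does this; it assumes the probable error is 0.6745 times the standard deviation of the logarithms, and the counts are hypothetical.

```python
import math
import statistics


def log_scale_summary(counts):
    """Geometric mean and the range mean +/- one probable error, taken on log10 counts.

    Assumption: probable error = 0.6745 * standard deviation of the logarithms.
    """
    logs = [math.log10(c) for c in counts]   # counts must be greater than zero
    mean_log = statistics.fmean(logs)
    probable_error = 0.6745 * statistics.stdev(logs)
    geometric_mean = 10 ** mean_log
    lower = 10 ** (mean_log - probable_error)
    upper = 10 ** (mean_log + probable_error)
    return geometric_mean, lower, upper


# Hypothetical plate counts per c.c.
gm, lower, upper = log_scale_summary([10, 100, 1000])
print(round(gm, 1), round(lower, 1), round(upper, 1))
```

The resulting limits are unequal distances above and below the geometric mean, which reflects the asymmetry of the original counts.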

The opportunity for the application of statistical tests to problems of 
water supply quality is not restricted, however, to the materials of the 
analyst. The consideration of the potability of a supply involves 
always a series of mutually active attributes, each of which has its im- 
portance in determining the character of the water. The concept of 
quality connotes, therefore, a composite of properly weighted individ- 
ual and fundamental units, in the evaluation of which statistics again 
comes to the fore. 

It is unfortunate, however, that in the field of interpretation of 
quality statistical method has been even slower of application than in 
the corresponding study of laboratory data. The quantitative eval- 
uation of sanitary data has always given way to the liberal exercise of 
expert personal judgment. Where a multiplicity of causes predetermines 
a phenomenon, such as quality, it was thought that a proper 
perspective was possible only through the development of a maturity 
of judgment in which the play of the manifold effects was qualitatively 
summarized rather than quantitatively analyzed. As the methods of 
diagnosis of quality developed, however, the opportunity for the fruitful 
application of the principles of mass phenomena gradually became 
apparent. With this development of a new viewpoint, good as well as 
evil sometimes resulted. A complete swinging of the pendulum to the 
quantitative side of interpretation was feared, where the attempt was 
made to substitute for individual experience and judgment pseudo- 
quantitative measures of doubtful significance. Some of these efforts, 
in which statistical laws frequently were ignored, will be discussed later 
in this paper. In general, however, a realization is gradually coming 
over the sanitarian that statistics as a means, rather than as an end, 
has much to offer in the clarification of his problems. If the succeeding 
pages seem somewhat bare, in their statistical implication, the pro- 
fessional statistician should remember that the concepts there dis- 
cussed mark the advance of a new light in sanitary engineering, which, 
though feeble in its flicker, gives promise of a greater brilliance in the 
not distant future. 

Attempts to formulate water supply standards of composite char- 
acter represented one of the earliest applications of semi-statistical 
method. Most of these were based upon the erroneous conclusion 
that methods of evaluating units had been standardized throughout 
the country. Attention has been called to this fallacy of endeavoring 
to establish limiting values of units attained by varying methods by 
Hinman 20 , Norton 21 , and Morse and Wolman 22 . Fundamental training 
in statistical interpretation no doubt would prevent the adoption 
of water supply quality standards before the principles of unit evalua- 
tion have been rigidly enforced. 

It is not amiss, perhaps, to call attention at this point to the close 
analogy between the so-called scoring of a water supply, or the quanti- 
tative allocation of the quality upon the scale of sanitary safety, and 
the statistician's concept of index numbers. Wolman 14 has shown 
recently that the operations involved in making a price index number 
are similar to those followed, to a greater or less extent, by investiga- 
tors of water supply scores. In the case of price index numbers, the 
object of weighting is to give each commodity included in the index 
number an influence upon the results corresponding to its commercial 
importance. In water supply index numbers, the object of weighting 
likewise is to give each factor making up the score an influence upon 
the results corresponding to its sanitary importance. Although the 
problems in the two fields are the same, their solutions are necessarily 
different, since, in the case of water supply scores, the conversion to a 
common base of such units as bacterial results, sanitary surveys, opera- 
ting efficiencies, etc., cannot be carried out because of the presence of 
varying personal opinion or judgment. It has been noted 14 , however, 
that it still remains possible to make use nationally of simplified index 
numbers of water supply quality restricted in their range of signifi- 
cance and composed of similar units or, better still, of individual units, 
provided the method of evaluation of such units has been definitely 
and completely fixed. 
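
The analogy with a weighted price index can be made concrete. In the Python sketch below every component name, score and weight is hypothetical; none is taken from any published scoring system. It shows only the mechanics of weighting, under the assumption that each component has already been reduced to a common 0 to 100 scale.

```python
def weighted_score(component_scores, weights):
    """Weighted arithmetic mean of component scores, in the manner of an index number."""
    total_weight = sum(weights[name] for name in component_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(component_scores[name] * weights[name] for name in component_scores)
    return weighted_sum / total_weight


# Hypothetical components on a 0-100 scale and hypothetical sanitary weights.
scores = {"bacterial_results": 80.0, "sanitary_survey": 60.0, "operating_efficiency": 90.0}
weights = {"bacterial_results": 5.0, "sanitary_survey": 3.0, "operating_efficiency": 2.0}
print(weighted_score(scores, weights))
```

The difficulty noted in the text lies not in this arithmetic but in the reduction of unlike units to the common scale, which the sketch simply takes as given.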

Interpretations of the quality of a water include frequently more 
than a summary of the structural and environmental features of the 
supply. The possibilities of the intelligent and fruitful application of 
statistical devices, such as the coefficients of correlation and of varia- 
tion, to other phases of water supply are mentioned only briefly here, 
since their complete discussion would involve a paper of a far too great 
length. Whipple,* for instance, has suggested the use of the coefficient 
of correlation in analyzing the vital statistics of cities which have made 
changes from poor to good quality water supplies, in order to demon- 
strate quantitatively the existence of the Mills-Reincke phenomenon. 
Hazen 15 has made excellent use of statistical method in his analysis of 
the storage provided in an impounding reservoir on any stream and 
the quantity of water which can be supplied continuously by it. He 
introduces the coefficient of variation as a measure of the degree of 
variation in flows of different streams and by its further use has found 
it possible to get an approximate expression for the storage required to 
carry the surplus water of wet years over to dry years, which expres- 
sion, in general terms, applied equally well to streams in different 
localities. In addition, he describes methods of estimating the proba- 
ble errors in the results obtained and makes the important comment 
that "frank recognition of the large probable errors in many of the 
results cannot fail to be advantageous." 15 
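
As a rough modern illustration of the coefficient of variation used in
this way, the short Python sketch below compares two streams having the
same mean flow. The flow figures and the function name are hypothetical
and are not taken from Hazen's paper; the sample standard deviation is
assumed as the measure of spread.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)


# Hypothetical annual flows in arbitrary units; both streams average 100.
steady_stream = [95, 102, 98, 105, 100]
flashy_stream = [40, 160, 75, 130, 95]

cv_steady = coefficient_of_variation(steady_stream)
cv_flashy = coefficient_of_variation(flashy_stream)

# The stream with the larger coefficient varies more from year to year
# relative to its mean, whatever the absolute size of its flow.
print(f"steady: {cv_steady:.3f}  flashy: {cv_flashy:.3f}")
```

Because the coefficient is a pure number, streams of very different size
can be compared on the same footing.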

* Personal communication.

The opportunities for further application of similar methods have
appeared in the present writer's studies of the correlation of bacterial
contents in water supplies with rainfalls upon stream watersheds and
with hygienic resultants of inferior quality such as typhoid fever and
diarrhoeal diseases. In these particular studies, the statistician could
contribute excellent aid, since the writer is not aware of an effective
method of comparing correlated phenomena in which one series of
characteristics is continuous, while another is discontinuous. In
addition, quantitative variations in magnitude of the values in both
series are not of paramount importance, but the direction of such
variations is the interesting event. The coefficient of concurrent
deviations, in this instance, does not appear to supply all the
desiderata. An example may make our problem clearer. In the study of
the daily tap
water analyses of a city water supply, we find, by inspection, that the 
B. coli contents rise after rains on the watershed of the stream supply- 
ing the town. It is also found that such rises are masked, to varying 
degrees, by purification processes and by the efficiency of operation 
of such processes. If changes in method and efficiency of purification 
are brought about and the qualitative reflection of rainfalls in resultant 
B. coli density in tap waters is modified, how can we measure quantita- 
tively the change in sensitiveness of tap water quality to rainfall from 
month to month? The data at hand for this purpose, reduced to 
simplest terms, are in each month B. coli values for each day (continu- 
ous series), which differ in density from day to day, and rainfall records 
(discontinuous series) which may give a zero value for all the days but 
three or four during the month. If, during the month of July, the B. 
coli per 100 c.c. rose from 2 to 2,000 from July 7 to July 8, following a 
rain of 0.8 inch on the stream on July 7, and during August the B. coli 
per 100 c.c. showed no jumps above 5 in spite of a number of days of 
rainfall of about 0.8 inch, what should be the statistical relation be- 
tween the months of July and August for these particular considerations? 
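
One possible way of putting the question in numerical form is sketched
below in Python. It is offered only as an illustration, not as a solution
of the problem posed nor as a method used by the writer. For each day
with rain at or above an assumed threshold, the logarithm of the ratio of
the next day's count to the rain day's count is taken, and the median of
these values is reported for the month. The daily series, the threshold
of 0.5 inch and the function name are hypothetical, chosen to resemble
the July and August cases described above.

```python
from math import log10
from statistics import median


def rain_response(coli, rain, threshold=0.5):
    """Median log10 rise in count from each rain day to the day after.

    coli: daily B. coli counts per 100 c.c.
    rain: daily rainfall in inches (mostly zeros).
    Only days with rain >= threshold contribute, so the many dry days
    do not dilute the measure. Zero counts are skipped because their
    logarithm is undefined (an assumption of this sketch).
    """
    rises = [
        log10(coli[i + 1] / coli[i])
        for i in range(len(coli) - 1)
        if rain[i] >= threshold and coli[i] > 0 and coli[i + 1] > 0
    ]
    return median(rises) if rises else None


# Hypothetical extracts of two months of daily records.
july_coli = [2, 3, 2, 2000, 40, 5]
july_rain = [0, 0, 0.8, 0, 0, 0]
august_coli = [2, 3, 4, 2, 5, 3]
august_rain = [0.8, 0, 0.8, 0, 0.7, 0]

july = rain_response(july_coli, july_rain)
august = rain_response(august_coli, august_rain)
print(july, august)
```

A value near zero means the tap water count did not respond to rain; a
value of 3 means a thousandfold rise on the day after rain. Comparing
the monthly values gives one crude index of the change in sensitiveness,
though it says nothing of its probable error.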

This paper should not be concluded without some reference to the 
part that the study of purification processes has played in modifying 
and determining the quality of water supplies and the importance 
therein of the mathematician's tools. It is frequently the sanitarian's 
problem to include in his valuation of a water's safety some definite 
estimate, among other things, of the efficiency of operating features 
involved in the treatment of such a supply. This problem has given 
rise to various measures of treatment efficiencies, which only recently 
have been subjected to rigid statistical study. One measure of this
type, the percentage removal of bacteria from untreated to treated
waters, has persisted. Statistical objections to it are well known to
the reader, and substitutes for this measure of performance, and
indirectly of quality, have been much sought after. It was long
recognized that the real measure of performance should include data
regarding the distribution of the efficiencies over long periods, and
recommendations suggesting the classification of bacterial results
according to frequency distributions have done much to clarify the
interpretation of treatment figures.

Further development of the same problem of plant performance 
along statistical lines has been made by Wolman 23 in the study of the
nature of bacterial removal in filtration plants. In this discussion, it
was suggested that "the normal performance of a water filtration plant
may be represented by a curve having the equation: y = x^c, where
y and x are respectively the raw water and final effluents counts, and c
is a constant for the particular plant under discussion." In other
words, the tentative hypothesis was brought forth that the final efflu- 
ent count, on the average, is an exponential function of the raw water 
count. The evaluation of "c" replaces also the unsatisfactory per- 
centage efficiency as a more adequate measure, by using the ratio of 
the logarithms of the counts instead of the ratio of the actual bacterial 
values. 
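
The contrast between the two measures may be sketched numerically as
follows, taking the quoted relation y = x^c to mean c = log y / log x.
The counts and function names are hypothetical, and the sketch assumes
effluent counts greater than 1, since the logarithm of 1 is zero and the
ratio is then undefined.

```python
from math import log10


def filtration_constant(raw, effluent):
    """c in raw = effluent ** c, i.e. log(raw) / log(effluent)."""
    return log10(raw) / log10(effluent)


def percent_removal(raw, effluent):
    """Conventional percentage efficiency of bacterial removal."""
    return 100.0 * (raw - effluent) / raw


# Hypothetical counts per c.c. on a heavily and a lightly loaded day.
heavy = (10000, 100)
light = (100, 10)

c_heavy = filtration_constant(*heavy)
c_light = filtration_constant(*light)
pct_heavy = percent_removal(*heavy)
pct_light = percent_removal(*light)

# The percentage removal differs between the two days, while the ratio
# of logarithms is the same for both.
print(c_heavy, pct_heavy)
print(c_light, pct_light)
```

In this example the percentage removal falls from 99 to 90 when the raw
water improves, although the plant is following the same curve on both
days.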

It is apparent that a measure of performance to be effective for 
adaptation to quality interpretation should include more than an array 
of its daily values, since it is the consistency of bacterial removal which 
predetermines the position of a form of treatment in the scale of the 
safety of a supply. Heretofore, no single unit of measure of this degree 
of consistency of removal has been available, although the fitting of 
normal performance data to the logarithmic curve of filtration supplied 
at least a graphic method of testing consistency. 23 If bacterial data
are arranged and plotted on the probability paper already referred to 
in the discussion, it becomes extremely easy to obtain the values of the 
semi-interquartile ranges of the figures in successive steps of purifica- 
tion. The ratio of such values of the ranges for any two steps appears 
to the writer to present some promise of a real measure of the "level- 
ling" effect of purification processes, since it measures the change pro- 
duced in the frequency distribution of bacteria in passing through the 
treatment. The demonstration of its value may be more apparent to 
the reader by reference to material given elsewhere. 24 
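
A minimal sketch of the proposed ratio follows. The counts are
hypothetical, and the quartiles are computed directly from the figures
rather than read from probability paper; whether the ranges are better
taken on the counts or on their logarithms is left open here.

```python
from statistics import quantiles


def semi_interquartile_range(values):
    """Half the distance between the first and third quartiles."""
    q1, _median, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2


# Hypothetical daily counts per c.c. at two successive steps.
raw_water = [200, 450, 800, 1500, 3000, 5200, 9000]
filtered = [4, 6, 8, 10, 12, 15, 20]

siqr_raw = semi_interquartile_range(raw_water)
siqr_filtered = semi_interquartile_range(filtered)

# A large ratio indicates that the treatment has narrowed the spread of
# the counts, that is, a strong "levelling" effect.
levelling_ratio = siqr_raw / siqr_filtered
print(siqr_raw, siqr_filtered, round(levelling_ratio, 1))
```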